% Chapter 2. Section 3.

\section{The Compiler - GCC}
\label{gcc}
Probably everyone has heard of GCC and thinks that it is an acronym for GNU C Compiler. That is not the whole truth: GCC is not just a C compiler, and the name now stands for GNU Compiler Collection. Let's see what it really is and what it does for software developers.

\subsection{What the compiler does}
The term ``compiler'' is not precisely defined. Some call a compiler an application that translates source code written in a high-level programming language into binary code. Others say that the compiler translates the code only into assembly language. Either way, one thing can be said for sure: the compiler translates source code from a higher-level language into a lower-level one. Here we discuss C/C++ compilers.

Usually a compiler consists of several components: \textit{preprocessor, translator, assembler, linker, optimizer, archiver,} etc. Each of these tools does its part of the whole job.

The preprocessor acts first. It finds all the preprocessor directives in the source code (lines starting with the ``\#'' symbol) and executes the specified command (e.g. \#include or \#define). For example, the \#include directive is executed as follows: the preprocessor opens the file to be included and processes it, executing the directives in that file as well. When all the nested files are processed, the contents of the included file are copied to the place where the \#include directive was. After preprocessing, no \#include, \#define or conditional directives remain in the code. We can say that the output is code in ``pure C++''.

The translator can then be run on that ``pure C++'' code. It translates the high-level entities of C++ into the corresponding instructions in assembly language. Usually a single line of C++ code is translated into several assembly instructions, which makes the resulting code hard for humans to read.

Next the assembler comes onto the scene. It takes the code in assembly language and translates it into object code. Each source file is translated into a single object file. These files contain information about the functions and variables defined in the file and about its ``external references'' (that is, calls to functions and references to variables defined elsewhere).

Then the linker combines all the object files and libraries to build the binary code. It tries to find the definitions of all the functions that are called in the source code. It also needs the program entry point, the function called ``main''. If all the references and the entry point are found, the binary code can be generated.

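The code optimizer analyzes the code and looks for ways to improve it. At this step some parts of the original source code can be removed (e.g. unreachable branches of conditional statements). As a result the binary code gets smaller and runs faster. Although the optimizer is listed last here, in GCC most optimization is performed during translation, before the assembler runs, and it is enabled with options such as \textit{-O2}.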
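
As a hedged illustration of the kind of code an optimizer may remove, consider the following sketch. The file name \textit{dead\_code.cpp} and all the identifiers in it are hypothetical and do not come from the examples of this chapter. Because the condition is a compile-time constant, an optimizing build (e.g. \textit{g++ -O2 dead\_code.cpp}) is allowed to drop the unreachable branch entirely; whether it actually does so depends on the compiler version and options.

\begin{lstlisting}[language=c++]
// dead_code.cpp (hypothetical file name)
#include <cstdio>

// Compile-time constant: the condition below can never be true.
const bool kTracing = false;

int square(int x)
{
    if (kTracing) {
        // Unreachable branch: a candidate for removal by the optimizer.
        std::printf("square(%d) called\n", x);
    }
    return x * x;
}

int main()
{
    // Exit code 0 means the result is as expected.
    return square(3) == 9 ? 0 : 1;
}
\end{lstlisting}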

When we are building a library archive, we call the archiver instead of the linker once we have the object files (see section \ref{why_libraries} for details). It combines them into a single archive which can later be linked into other programs (in fact the linker unpacks the archive and extracts the object files it needs).

\subsection{GNU GCC compiler}
The GNU Compiler Collection (GCC) is the most widely used free compiler. It is a huge project consisting of several compilers, for C, C++, Fortran, Objective-C, Ada and, in older releases, Java. Each compiler comes with its complete tool chain. GCC has been ported to many architectures and platforms. It is probably the most powerful tool available to software developers.

It comes pre-installed with almost all GNU/Linux distributions, and its main executable is called \textit{gcc}. Execute the command \textit{gcc -v} to make sure that GCC is installed and to get its version and some basic information. To compile code written in C++, run \textit{g++ <file\_name>} instead.

\subsubsection{Controlling the build process}
As you might guess, GCC contains all the components described earlier. In fact several of them are implemented as separate applications. This makes it possible to go through the build process step by step. For example, you can stop the build after the first step and analyze the code in ``pure C++''. This may not make sense for huge products, but in our case it is useful to see what happens to our code during the build process.

GCC supports several options for controlling the build process. For example, to stop after the preprocessor has processed the source file, run the compiler with the \textit{-E} option. Let's take an example to work with:

\lstinputlisting[label=gcc_example,caption=An example to build,language=c++]{Sources/chapter2/gcc_example.cpp}

To get this code preprocessed, execute the following command: \textit{g++ -E gcc\_example.cpp}. The output will be the following:

\lstinputlisting[label=gcc_example_preprocessed,caption=After preprocessing...,language=c++]{Sources/chapter2/gcc_example_preprocessed.cpp}

This demonstrates how the \textit{\#define} and \textit{\#if ... \#endif} directives are processed. We do not show an example of the \textit{\#include} directive, because it usually makes the output hundreds of times larger. But you can still try it yourself.

The result of preprocessing can be passed to the compiler again, or we can reuse the code from listing \ref{gcc_example}. Let us run the compiler and stop after the assembly code is generated. This is achieved with the \textit{-S} option. Execute the following command: \textit{g++ -S gcc\_example.cpp}. This command prints nothing. Instead you will find a newly created file called \textit{gcc\_example.s}, which contains the assembly code.

To generate the object code, run the compiler with the \textit{-c} option: \textit{g++ -c gcc\_example.cpp}. The output is a new file called \textit{gcc\_example.o}. You cannot inspect this file in an ordinary text editor, because it contains almost no textual data. Instead you can use a special tool for analyzing object files: \textit{objdump}. Execute the following command: \textit{objdump -t gcc\_example.o}. From the printed output we can see that there is a function called \textit{main} defined in the \textit{.text} section of that file. We can also see an undefined symbol (a reference to a function defined elsewhere) called \textit{\_\_gxx\_personality\_v0}. This information will later be used by the linker.
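
The same inspection can be tried on a smaller file. The sketch below is not the chapter's \textit{gcc\_example.cpp}; the file name \textit{symbols\_demo.cpp} and the function \textit{triple} are assumptions made for illustration. After \textit{g++ -c symbols\_demo.cpp}, the output of \textit{objdump -t symbols\_demo.o} should list \textit{main} and \textit{triple} (the latter under its mangled C++ name) as defined in the \textit{.text} section, and \textit{puts} as an undefined symbol left for the linker to resolve. The exact listing depends on the compiler version and the platform.

\begin{lstlisting}[language=c++]
// symbols_demo.cpp (hypothetical file name)
#include <cstdio>

// Defined here: ends up as a symbol in the .text section.
int triple(int x)
{
    return 3 * x;
}

int main()
{
    // std::puts is only declared in <cstdio>; its definition lives in
    // the C library, so the object file records an external reference.
    std::puts("symbols demo");
    return triple(2) == 6 ? 0 : 1;
}
\end{lstlisting}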

The linker operates on object files. It checks that one of the files passed to it contains the \textit{main} function; if that function is found, it is taken as the entry point of the program. The linker then tries to find all the functions called from the processed files. During this search it also processes any libraries passed to it (more information on this topic can be found in section \ref{inc_lib_pkgconfig}). If all the references are resolved, the binary file can be generated.

\subsubsection{Include paths, libraries, pkg-config}
\label {inc_lib_pkgconfig}
Software developers usually include header files and call library functions without caring where the compiler finds those headers and libraries. Here we'll see how that process works and how to control the compiler's behavior.

Almost every application written in C++ uses the \textit{cout} object, which belongs to the namespace \textit{std} and is declared in a header file called \textit{iostream} (please note that the file is not called \textit{iostream.h}, as you may be used to). To be able to use this object we write the following preprocessor directive: \textit{\#include <iostream>}. We may not know where the file is actually located in the file system, but obviously the compiler does.

In GNU/Linux distributions the C/C++ header files are typically located in the \textit{/usr/include} folder and its subfolders. When you include a file, the preprocessor searches these system default folders. So when you want to include a file from somewhere else, you have to tell the compiler explicitly where the file can be found.

This is done by passing the \textit{-I} option to the compiler. Let's say we want to call a function declared in a file called \textit{myfunc.h} located in \textit{/home/me/include}. The preprocessor directive would be \textit{\#include <myfunc.h>}, and the compile command would be \textit{g++ -I/home/me/include my\_file.cpp}. In this case the preprocessor searches for the included file in the given folder in addition to the default folders. If you don't want the default folders to be searched, pass the \textit{-nostdinc} option to the compiler.

Things are a little more complicated when you want to use a non-standard library. The standard library of the language is linked automatically during the build. For C this is the \textit{libc} library; for C++ the library is called \textit{libstdc++}. For example, the object \textit{cout} is defined in \textit{libstdc++}.

The system library files in GNU/Linux distributions are usually stored in the \textit{/lib} or \textit{/usr/lib} folders. The linker searches these folders by default. If you want to link a library located in another folder, use the \textit{-L} option. Along with this option you usually need to name the library to link, using the \textit{-l} option. Shared library files are named as follows: \textit{libXXX.so}. For example, the POSIX Threads library file is called \textit{libpthread.so}. It is enough to pass the name of the library without the \textit{lib} prefix and the \textit{.so} suffix. For example, if your program uses a function from the library \textit{libmylib.so}, which is in the \textit{/home/me/lib} folder, invoke the following command: \textit{g++ myfile.cpp -L/home/me/lib -lmylib}.
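
As a concrete sketch of linking against a library by name, the following program uses the POSIX Threads library mentioned above. The file name \textit{thread\_demo.cpp} and the functions \textit{worker} and \textit{run\_worker} are hypothetical. It can be built with \textit{g++ thread\_demo.cpp -lpthread}; since \textit{libpthread.so} is normally found in a default library folder, no \textit{-L} option should be needed. On some recent systems the thread functions are part of \textit{libc} itself, in which case the extra option is harmless.

\begin{lstlisting}[language=c++]
// thread_demo.cpp (hypothetical file name)
// Build: g++ thread_demo.cpp -lpthread
#include <cstddef>
#include <pthread.h>

// Thread body: stores a value through the pointer it receives.
static void* worker(void* arg)
{
    int* result = static_cast<int*>(arg);
    *result = 42;
    return NULL;
}

// Starts one thread, waits for it and returns the value it stored.
// Returns -1 if the thread could not be created or joined.
int run_worker()
{
    int result = 0;
    pthread_t thread;
    if (pthread_create(&thread, NULL, worker, &result) != 0)
        return -1;
    if (pthread_join(thread, NULL) != 0)
        return -1;
    return result;
}

int main()
{
    // Exit code 0 means the worker thread ran and stored its value.
    return run_worker() == 42 ? 0 : 1;
}
\end{lstlisting}

The functions \textit{pthread\_create} and \textit{pthread\_join} are only declared in \textit{pthread.h}; their definitions are what the linker looks up in the library.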

There are two types of library files: dynamically linked and statically linked. The former are called \textit{Shared objects}, while the latter are usually called \textit{Archives}. By default the linker looks for a shared object with the appropriate name. Sometimes linking shared objects into the program is not what you want. In that case you should explicitly tell the linker to link statically by passing the \textit{-static} option. The following command tries to build the program discussed earlier, this time with static linking: \textit{g++ myfile.cpp -L/home/me/lib -lmylib -static}.

Unfortunately, libraries change their names (at least the version part of the name), and developers cannot be sure of the name of the file their program needs in order to build and run. The differences between GNU/Linux distributions add to the problem. For example, the folder which contains the header files of the \textit{GTK+ version 2.0} toolkit can be either \textit{gtk+2.0} or \textit{gtk+-2.0} or even \textit{gtk-2}. So how can software developers be sure of the file name on the current distribution? And the problem is not only the file name itself: the library can be installed in a custom location (say, under \textit{/usr/local} instead of the default \textit{/usr} prefix).

All these problems can be solved with a very useful tool, \textit{pkg-config}. In the \textit{/usr/lib/pkgconfig} folder you can find many files with the \textit{.pc} extension. They are installed along with the libraries and contain all the necessary information about them. To find out where the header files of some library (say, \textit{GTK+ 2.0}) are located, execute the following command: \verb|pkg-config --cflags gtk+-2.0|. It prints all the options that must be passed to the compiler in order to use the GTK+ header files.

To get the complete information about the library files (shared objects or archives and their locations), use the following command: \verb|pkg-config --libs gtk+-2.0|. Its output can be passed to the linker as is if you want to use the library.

Putting all this together, compilation commands usually look like this:

\verb|g++ myfile.cpp `pkg-config --cflags --libs somelib`|

This command contains a sub-command, enclosed in backquotes, which prints all the necessary options for the library called \textit{somelib}; the shell passes that output directly to the compiler.
